Context
The number of restaurants in New York is growing rapidly. Many students and busy professionals rely on these restaurants because of their hectic lifestyles, and online food delivery is a convenient option that brings them good food from their favorite restaurants. FoodHub, a food aggregator company, offers access to multiple restaurants through a single smartphone app.
The app allows restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.
Objective
The food aggregator company has stored the data of the different orders made by registered customers in their online portal. They want to analyze this data to get a fair idea of the demand for different restaurants, which will help them enhance the customer experience. Suppose you are hired as a Data Scientist at this company, and the Data Science team has shared some key questions that need to be answered. Perform the data analysis to find answers to these questions and help the company improve its business.
The data contains information related to each food order. A detailed data dictionary is provided below.
Data Dictionary
# import libraries for data manipulation
import pandas as pd
# import libraries for data visualization
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
# misc/other imports
import warnings
# import libraries for python utils
import tabulate
# GLOBAL OPTIONS ---
# To avoid clutter in the output, suppress warnings
warnings.filterwarnings("ignore")
# Set global figure size for all plots
plt.rcParams["figure.figsize"] = (15, 6)
plt.rcParams["font.size"] = 12
# Set the grid style globally (for both major and minor grids)
# plt.rcParams['axes.grid'] = True
plt.rcParams["grid.linestyle"] = "--"
plt.rcParams["grid.alpha"] = 0.6
plt.rcParams["grid.color"] = "#E2E0E0FF"
# Set global font sizes for title, xlabel, and ylabel
plt.rcParams["axes.titlesize"] = 16 # Default font size for plot titles
plt.rcParams["axes.labelsize"] = 12 # Default font size for x and y labels
# Seaborn style ---
# Set global Seaborn style and context
# ref: https://r02b.github.io/seaborn_palettes/
sns.set_theme(style="whitegrid", palette="muted", context="notebook")
# UTILS (plot) ---
def set_fig_size(width, height):
    """
    Set figure size (for the upcoming plot only)
    NOTE: this applies to the next plot only (not all plots)
    """
    plt.figure(figsize=(width, height))

def set_small_fig():
    """
    (5, 5) -> ar: 1
    Square-shaped figure (eg: scatter plot)
    """
    set_fig_size(5, 5)

def set_medium_fig():
    """
    (9, 6) -> ar: 1.5
    Normal | Medium
    """
    # alternate = (12, 8)
    set_fig_size(9, 6)

def set_large_fig():
    """
    (12, 6) -> ar: 2
    Large figure
    """
    set_fig_size(12, 6)

def set_vlarge_fig():
    """
    (15, 6) -> ar: 2.5
    Very large figure (eg a time series covering a long span)
    """
    set_fig_size(15, 6)
# UTILS (PANDAS) ---
def tableit(series: pd.Series, index_label: str = None, value_label: str = "value"):
    """
    Display a pandas Series as a formatted table using the tabulate library.
    Args:
        series (pd.Series): The pandas Series to be displayed.
        index_label (str): The name of the index column. Defaults to `series.name`.
        value_label (str): The header for the value column. Defaults to "value".
    Returns:
        None
    """
    i_name = index_label if index_label else series.name
    table = tabulate.tabulate(
        series.items(), headers=[i_name, value_label], tablefmt="grid"
    )
    print(table)

# todo: decide default value for show_index
def tableit_df(df: pd.DataFrame, show_index: bool = True):
    """
    Display a pandas DataFrame as a formatted table using the tabulate library.
    Args:
        df (pd.DataFrame): The pandas DataFrame to be displayed.
        show_index (bool): Whether to display the DataFrame index. Default is True.
    Returns:
        None
    """
    # Convert the DataFrame to a tabulated string
    table = tabulate.tabulate(
        df, headers="keys", tablefmt="pretty", showindex=show_index
    )
    print(table)
# UTILS (Python | Primitive) ---
def snake_to_pascal(snake_str, join_with=" "):
    """Convert snake_case to title-cased words (eg my_name -> My Name; pass join_with="" for MyName)
    Args:
        snake_str (str): string to convert
        join_with (str): string used to join the components; default is a space
    """
    components = snake_str.split("_")
    return join_with.join(x.title() for x in components)

def format_pct(val):
    """Format a value as a percentage, i.e. at most 2 decimal places with a trailing %"""
    return f"{val:.2f}%"

# UTILS (Python | Stats) ---
def calc_iqr(series: pd.Series):
    """
    series: array of numerical values
    """
    Q1 = series.quantile(0.25)
    Q3 = series.quantile(0.75)
    IQR = Q3 - Q1
    return Q1, Q3, IQR

def to_percentage(value):
    """value is expected to be a normalized float in [0, 1]"""
    return format_pct(value * 100)
⚡ Note
Please note that some lines in this notebook may appear as commented-out code (e.g., print()/repr()) in favor of using tableit()/tableit_df() for more structured outputs. These comments are intentionally left in place so that, if the notebook is exported to a different format and the display results do not appear as expected, the default behavior can easily be restored (e.g., using repr() for outputs as in the original notebook).
# uncomment and run the following lines for Google Colab
# from google.colab import drive
# drive.mount('/content/drive')
# Load the dataset
df = pd.read_csv('foodhub_order.csv')
# Make a copy of the original DataFrame (to have a original copy of the data)
original_df = df.copy()
# View the first 5 rows
df.head(5)
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | Not given | 25 | 20 |
| 1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | Not given | 25 | 23 |
| 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5 | 23 | 28 |
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3 | 25 | 15 |
| 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4 | 25 | 24 |
🔍 Observation:
rows, cols = df.shape
rows, cols
(1898, 9)
🔍 Observation:
tableit(df.dtypes, index_label='column')
+-----------------------+---------+
| column                | value   |
+=======================+=========+
| order_id              | int64   |
+-----------------------+---------+
| customer_id           | int64   |
+-----------------------+---------+
| restaurant_name       | object  |
+-----------------------+---------+
| cuisine_type          | object  |
+-----------------------+---------+
| cost_of_the_order     | float64 |
+-----------------------+---------+
| day_of_the_week       | object  |
+-----------------------+---------+
| rating                | object  |
+-----------------------+---------+
| food_preparation_time | int64   |
+-----------------------+---------+
| delivery_time         | int64   |
+-----------------------+---------+
🔍 Observation:
🔧 Actionable Insights
Although the 'rating' column is currently of type object, it should ideally be numeric (e.g. int64), as ratings are ordinal in nature. This will be addressed later (during the data cleaning/preprocessing step).
# ? As suggested in the question, using `info()` as well
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   order_id               1898 non-null   int64
 1   customer_id            1898 non-null   int64
 2   restaurant_name        1898 non-null   object
 3   cuisine_type           1898 non-null   object
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object
 6   rating                 1898 non-null   object
 7   food_preparation_time  1898 non-null   int64
 8   delivery_time          1898 non-null   int64
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB
has_missing_values = df.isnull().any().any()
has_missing_values
False
🔍 Observation:
There are no explicit missing values in the dataset. However, the rating column contains entries labeled as "Not given" which indicate the absence of a customer rating. These entries should be treated as missing values during analysis to ensure accurate insights.
stat_summary = df.describe(include='all').T
stat_summary
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| order_id | 1898.0 | NaN | NaN | NaN | 1477495.5 | 548.049724 | 1476547.0 | 1477021.25 | 1477495.5 | 1477969.75 | 1478444.0 |
| customer_id | 1898.0 | NaN | NaN | NaN | 171168.478398 | 113698.139743 | 1311.0 | 77787.75 | 128600.0 | 270525.0 | 405334.0 |
| restaurant_name | 1898 | 178 | Shake Shack | 219 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| cuisine_type | 1898 | 14 | American | 584 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| cost_of_the_order | 1898.0 | NaN | NaN | NaN | 16.498851 | 7.483812 | 4.47 | 12.08 | 14.14 | 22.2975 | 35.41 |
| day_of_the_week | 1898 | 2 | Weekend | 1351 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| rating | 1898 | 4 | Not given | 736 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| food_preparation_time | 1898.0 | NaN | NaN | NaN | 27.37197 | 4.632481 | 20.0 | 23.0 | 27.0 | 31.0 | 35.0 |
| delivery_time | 1898.0 | NaN | NaN | NaN | 24.161749 | 4.972637 | 15.0 | 20.0 | 25.0 | 28.0 | 33.0 |
🔍 Observation:
# statistical summary for the 'food_preparation_time' column
food_preparation_summary = df['food_preparation_time'].describe()
food_preparation_summary
count    1898.000000
mean       27.371970
std         4.632481
min        20.000000
25%        23.000000
50%        27.000000
75%        31.000000
max        35.000000
Name: food_preparation_time, dtype: float64
❓ Question
What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed ?
=>
The minimum time for food preparation is 20 minutes, the average time is 27.37 minutes, and the maximum time is 35 minutes
📌 Points
📝 Gist
On average, restaurants take about 27 minutes to prepare food, with most orders falling between 23 and 31 minutes. While the fastest orders are ready in 20 minutes, some take up to 35 minutes. The slight variability in preparation time (around 4 minutes) suggests a consistent but slightly flexible pace across restaurants.
df['rating'].value_counts()
Not given    736
5            588
4            386
3            188
Name: rating, dtype: int64
✅ Answer
There are 736 orders that are not rated, as indicated by the "Not given" entries in the rating column.
⚡ Note
There are no missing values in the 'rating' column, so these entries clearly represent the absence of a rating; hence value_counts() can reliably answer the above question. If that were not the case, we would need to determine it with a different approach.
# GENERIC WAY :- An alternate way to find missing values
# NOTE: this is helpful especially when there is more than 1 non-numeric rating value
# (e.g. 'Not given', 'Not rated', 'NA', etc.)
# In the present case, there is only 1 such value, 'Not given',
# so it was straightforward to determine just by inspecting `value_counts()`
# find non-numeric values in the 'rating' column by converting the column to numeric & checking for NaN values
non_numeric_ratings_count = pd.to_numeric(df['rating'], errors='coerce').isnull().sum()
print(f"Number of non-numeric values in the 'rating' column: {non_numeric_ratings_count}")
Number of non-numeric values in the 'rating' column: 736
🔍 Observation
The 'rating' column contains 736 entries marked as 'Not given,' indicating that these orders were not rated by customers.
# ! NOTE: only attributes which are used further in some computation/manipulation are addressed here !!
# 1.
# NOTE: the modification is done in place, as the original df backup is already there, i.e. `original_df`
# convert the 'rating' column to a numeric type, replacing 'Not given' with NaN for consistency.
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
# Verify the changes
df['rating'].dtype
dtype('float64')
distinct_ratings = df['rating'].nunique()
distinct_ratings
3
df['rating'].value_counts()
5.0    588
4.0    386
3.0    188
Name: rating, dtype: int64
# get non-numeric values count in the new 'rating' column
new_val = df['rating'].isnull().sum()
prev_val = non_numeric_ratings_count
print(f"For verification (non-numeric count): {new_val} (previously {prev_val}), is_same = {new_val == prev_val}")
For verification (non-numeric count): 736 (previously 736), is_same = True
df['rating'].info()
<class 'pandas.core.series.Series'>
RangeIndex: 1898 entries, 0 to 1897
Series name: rating
Non-Null Count  Dtype
--------------  -----
1162 non-null   float64
dtypes: float64(1)
memory usage: 15.0 KB
# !! Assumption : Before this computation, ratings column should be converted to numeric type (ie Imputed with NaN for 'Not given' values)
# Check:-
# Top 5 restaurants with the highest average rating
top_5_restaurant_with_highest_ratings = df.groupby('restaurant_name')['rating'].mean().nlargest(5)
tableit(top_5_restaurant_with_highest_ratings, index_label='Restaurant', value_label='Average Rating')
+--------------------+------------------+
| Restaurant         | Average Rating   |
+====================+==================+
| 'wichcraft         | 5                |
+--------------------+------------------+
| 67 Burger          | 5                |
+--------------------+------------------+
| Anjappar Chettinad | 5                |
+--------------------+------------------+
| Asuka Sushi        | 5                |
+--------------------+------------------+
| Balade             | 5                |
+--------------------+------------------+
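One caveat with this ranking: a restaurant with a single 5-star rating ties with one rated 5 across many orders. A more robust variant would require a minimum number of ratings before ranking. A sketch on toy data (in the notebook, `df` with its real `restaurant_name` and numeric `rating` columns would be used; `MIN_RATINGS` is an assumed threshold):

```python
import pandas as pd

# Hypothetical mini-dataset for illustration only.
data = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "C", "C"],
    "rating": [5, 4, 5, 5, 3, 4],
})

MIN_RATINGS = 2  # assumed threshold; tune as needed

# Aggregate both the count and the mean of ratings per restaurant,
# then keep only restaurants with enough ratings before ranking.
stats = data.groupby("restaurant_name")["rating"].agg(["count", "mean"])
robust_top = stats[stats["count"] >= MIN_RATINGS]["mean"].nlargest(5)
print(robust_top)
```

Here "B" (a single 5-star rating) is excluded, so the ranking reflects sustained quality rather than one-off reviews.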
class UnivariatePlotHelper:
    def __init__(self, df: pd.DataFrame):
        self.df = df

    def draw_countplot(
        self,
        attribute: str,
        *,
        label=None,
        rot=0,
        order=None,
        sort=True,
        palette=None,
        showgrid=None,
    ):
        """
        sort: used only if order is None
        """
        # prep (meta) --
        xlabel = label if label else snake_to_pascal(attribute)
        priority = None
        if order is not None:
            priority = order
        elif sort:
            # sort by count
            # NOTE: value_counts() returns a Series sorted by values (DESC order by default)
            priority = self.df[attribute].value_counts().index
        # plot (crux) --
        ax = sns.countplot(data=self.df, x=attribute, order=priority, palette=palette)
        # display count above each bar
        ax.bar_label(ax.containers[0])
        # aesthetics --
        plt.title(f"Bar Graph of {xlabel}")
        plt.xlabel(xlabel)
        plt.ylabel("count")
        plt.xticks(rotation=rot)
        if showgrid:
            plt.grid(True)
        plt.show()

    def draw_boxplot(self, attribute: str, *, label=None):
        """
        attribute: name of the column
        """
        # prep
        xlabel = label if label else snake_to_pascal(attribute)
        # plot
        sns.boxplot(data=self.df, x=attribute)
        # aesthetics
        plt.title(f"Box Plot of {xlabel}")
        plt.xlabel(xlabel)
        plt.show()

    def draw_boxplot_fromseries(self, series: pd.Series, *, label=None):
        """
        series: array of numeric values
        """
        # prep
        xlabel = label if label else snake_to_pascal(series.name)
        # plot
        sns.boxplot(x=series)
        # aesthetics
        plt.title(f"Box Plot of {xlabel}")
        plt.xlabel(xlabel)
        plt.show()

    def draw_histplot(self, attribute: str, *, bins="auto", label=None, color=None):
        # prep
        xlabel = label if label else snake_to_pascal(attribute)
        # plot
        sns.histplot(data=self.df, x=attribute, bins=bins, kde=True, color=color)
        # aesthetics
        plt.title(f"Histogram of {xlabel}")
        plt.xlabel(xlabel)
        plt.ylabel("frequency")
        plt.show()

    def draw_pieplot(self, attribute: str, *, label=None):
        # prep
        label = label if label else snake_to_pascal(attribute)
        # plot
        counts = self.df[attribute].value_counts()
        # aesthetics
        plt.title(f"Pie Chart of {label}", pad=20)
        plt.pie(counts, labels=counts.index, autopct="%1.1f%%", startangle=140)
        plt.axis("equal")
        plt.show()

# helpers (init | singleton)
plot_helper = UnivariatePlotHelper(df)
df.columns.to_list()
['order_id', 'customer_id', 'restaurant_name', 'cuisine_type', 'cost_of_the_order', 'day_of_the_week', 'rating', 'food_preparation_time', 'delivery_time']
# total orders
total_orders = df.shape[0]
total_customers = df['customer_id'].nunique()
print(f'Total Orders: {total_orders} & Total Customers: {total_customers}')
Total Orders: 1898 & Total Customers: 1200
💡 Note: Since order_id and customer_id are nominal in nature, used primarily for identification, they are excluded from further analysis.
# Descriptive stats
restaurant_order_counts = df['restaurant_name'].value_counts()
restaurant_order_counts
Shake Shack 219
The Meatball Shop 132
Blue Ribbon Sushi 119
Blue Ribbon Fried Chicken 96
Parm 68
...
Sushi Choshi 1
Dos Caminos Soho 1
La Follia 1
Philippe Chow 1
'wichcraft 1
Name: restaurant_name, Length: 178, dtype: int64
# PLOT
# as there are too many categories, setting a somewhat unconventional figure size
set_fig_size(80, 20)
# restaurant names -> nominal variable | categorical
plot_helper.draw_countplot('restaurant_name', rot=90)
💡 Note:
As restaurant_name is nominal with many categories, this plot is hard to read. Let's pick a few priority restaurants and plot the graph again.
top_n = 10
# Top 10 restaurants with the highest demand (ie maximum orders)
top_restaurants_order_count = df['restaurant_name'].value_counts().nlargest(top_n)
top_restaurants_names = top_restaurants_order_count.index.to_list()
print(f'Top {top_n} restaurants with the highest demand (ie maximum orders):')
print(top_restaurants_names)
Top 10 restaurants with the highest demand (ie maximum orders): ['Shake Shack', 'The Meatball Shop', 'Blue Ribbon Sushi', 'Blue Ribbon Fried Chicken', 'Parm', 'RedFarm Broadway', 'RedFarm Hudson', 'TAO', 'Han Dynasty', 'Blue Ribbon Sushi Bar & Grill']
tableit(top_restaurants_order_count, value_label='#orders')
+-------------------------------+-----------+
| restaurant_name               | #orders   |
+===============================+===========+
| Shake Shack                   | 219       |
+-------------------------------+-----------+
| The Meatball Shop             | 132       |
+-------------------------------+-----------+
| Blue Ribbon Sushi             | 119       |
+-------------------------------+-----------+
| Blue Ribbon Fried Chicken     | 96        |
+-------------------------------+-----------+
| Parm                          | 68        |
+-------------------------------+-----------+
| RedFarm Broadway              | 59        |
+-------------------------------+-----------+
| RedFarm Hudson                | 55        |
+-------------------------------+-----------+
| TAO                           | 49        |
+-------------------------------+-----------+
| Han Dynasty                   | 46        |
+-------------------------------+-----------+
| Blue Ribbon Sushi Bar & Grill | 44        |
+-------------------------------+-----------+
plot_helper.draw_countplot('restaurant_name', label=f'Top {top_n} Restaurants', rot=30, order=top_restaurants_names)
# less demanding restaurants order counts
least_n = 10
bottom_restaurants_order_count = df['restaurant_name'].value_counts().nsmallest(least_n)
bottom_restaurants_names = bottom_restaurants_order_count.index.to_list()
print(f'The {least_n} restaurants with the least demand (ie fewest orders):')
print(bottom_restaurants_names)
The 10 restaurants with the least demand (ie fewest orders): ['Le Zie 2000 Trattoria', 'Cafe de La Esquina', "Sarabeth's West", 'Frank Restaurant', "Hiroko's Place", 'Rye House', 'Lamarca Pasta', '67 Burger', 'Lucky Strike', "Paul & Jimmy's"]
plot_helper.draw_countplot('restaurant_name', label=f'Bottom {least_n} Restaurants', rot=30, order=bottom_restaurants_names)
🔍 Insights :
All the bottom restaurants have only 1 order
# Descriptive stats
restaurant_order_counts.describe()
count    178.000000
mean      10.662921
std       24.047008
min        1.000000
25%        1.000000
50%        3.000000
75%        8.750000
max      219.000000
Name: restaurant_name, dtype: float64
🧠 Idea : Let's try to summarize restaurant order demand
# PLOT
plot_helper.draw_boxplot_fromseries(restaurant_order_counts)
# STATS
q1, q3, _ = calc_iqr(restaurant_order_counts)
q1, q3
(1.0, 8.75)
# Explore | Stats
# restaurants with more than 10 orders
restaurant_names_with_more_than_10_orders = restaurant_order_counts[restaurant_order_counts > 10].index
# get total orders count from all such restaurants
filtered_df = df[df['restaurant_name'].isin(restaurant_names_with_more_than_10_orders)]
filtered_df.shape
(1494, 9)
df.shape
(1898, 9)
diff = df.shape[0] - filtered_df.shape[0]
diff_pct = diff / df.shape[0] * 100
print(f"Total orders from restaurants with 10 or fewer orders: {diff} (ie {format_pct(diff_pct)})")
Total orders from restaurants with 10 or fewer orders: 404 (ie 21.29%)
🔍 Insights
Only about 404 orders (approx 21%) come from restaurants with 10 or fewer orders; the remaining ~79% of orders are concentrated in the busier restaurants (those with more than 10 orders).
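Two different shares are in play here and are easy to conflate: the share of restaurants above the threshold and the share of orders they account for. A minimal sketch separating them, using a toy stand-in for `restaurant_order_counts` (in the notebook, the real value_counts Series would be passed):

```python
import pandas as pd

# Toy stand-in for `restaurant_order_counts` (value_counts of restaurant_name).
order_counts = pd.Series({"R1": 219, "R2": 12, "R3": 3, "R4": 1, "R5": 1})

threshold = 10
busy = order_counts[order_counts > threshold]

# Share of restaurants vs share of orders: two distinct percentages.
share_of_restaurants = len(busy) / len(order_counts) * 100
share_of_orders = busy.sum() / order_counts.sum() * 100
print(f"{share_of_restaurants:.1f}% of restaurants receive "
      f"{share_of_orders:.1f}% of all orders")
```

On the toy data, 40% of restaurants receive ~98% of orders, illustrating how demand concentrates in a few busy restaurants.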
# categorical
x = 'cuisine_type'
plot_helper.draw_countplot(x, rot=45)
📌 Points
The distribution indicates a higher preference for certain cuisines among customers; hence restaurants should focus on these cuisines to attract more customers.
# helper cell
# : to find next column name
df.columns
Index(['order_id', 'customer_id', 'restaurant_name', 'cuisine_type',
'cost_of_the_order', 'day_of_the_week', 'rating',
'food_preparation_time', 'delivery_time'],
dtype='object')
x = 'cost_of_the_order'
df[x].describe()
count    1898.000000
mean       16.498851
std         7.483812
min         4.470000
25%        12.080000
50%        14.140000
75%        22.297500
max        35.410000
Name: cost_of_the_order, dtype: float64
# Plot
plot_helper.draw_histplot(x)
📌 Points :
# Descriptive stats
q1, q3, iqr = calc_iqr(df[x])
q1, q3, iqr
(12.08, 22.2975, 10.2175)
🔍 Insights:
The IQR for the cost of the order is about 10.22 dollars, i.e. the middle half of all orders falls within a roughly 10-dollar band around the median, showing moderate variation in how much customers spend.
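The IQR also gives a quick outlier rule of thumb via the 1.5×IQR Tukey fences. A sketch using the notebook's `calc_iqr` helper (redefined here for self-containment) on a toy cost sample; in the notebook, `df['cost_of_the_order']` would be passed instead:

```python
import pandas as pd

def calc_iqr(series: pd.Series):
    """Same helper as defined earlier in the notebook."""
    Q1 = series.quantile(0.25)
    Q3 = series.quantile(0.75)
    return Q1, Q3, Q3 - Q1

# Toy numeric sample for illustration (the 80.0 is an artificial extreme value).
costs = pd.Series([4.47, 12.08, 14.14, 22.30, 35.41, 80.0])

q1, q3, iqr = calc_iqr(costs)
# Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = costs[(costs < lower) | (costs > upper)]
print(f"Tukey fences: [{lower:.2f}, {upper:.2f}], outliers: {outliers.tolist()}")
```

Only the artificial 80.0 falls outside the fences here; applied to the real cost column, this same check would flag any unusually expensive orders.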
# Plot
plot_helper.draw_boxplot(x)
📌 Points:
# Stats | More
# deeper insights
print(f"skewness: {df['cost_of_the_order'].skew()}")
print(f"kurtosis: {df['cost_of_the_order'].kurt()}")
skewness: 0.6217469995871637 kurtosis: -0.6078508734318953
🔍 Insights:
The cost of orders has a slight right skew (skewness of 0.6), meaning there are a few higher-cost orders compared to the majority. The negative kurtosis value (-0.6) indicates that the distribution has lighter tails and is flatter than a normal distribution, suggesting fewer extreme values (outliers) in the cost data.
🔍 Observation:
The boxplot for 'Cost of the Order' shows that the median order costs about 14 dollars, with the middle half of orders falling between roughly 12 and 22 dollars. A few orders are much more expensive, likely due to larger or multiple items being ordered. So while most customers place affordable orders, some orders can be significantly higher in cost.
# helper cell
df.columns
Index(['order_id', 'customer_id', 'restaurant_name', 'cuisine_type',
'cost_of_the_order', 'day_of_the_week', 'rating',
'food_preparation_time', 'delivery_time'],
dtype='object')
# categorical
x = 'day_of_the_week'
plot_helper.draw_countplot(x)
# Descriptive Statistics
order_count_by_day_category = df['day_of_the_week'].value_counts()
order_count_by_day_category_pct = df['day_of_the_week'].value_counts(normalize=True) * 100
print('Order Counts by Day of the Week:', order_count_by_day_category, sep='\n')
print('\nPercentage of Orders by Day of the Week:', order_count_by_day_category_pct, sep='\n')
Order Counts by Day of the Week:
Weekend    1351
Weekday     547
Name: day_of_the_week, dtype: int64

Percentage of Orders by Day of the Week:
Weekend    71.18019
Weekday    28.81981
Name: day_of_the_week, dtype: float64
plot_helper.draw_pieplot(x)
📌 Points :
🔍 Observation:
There is higher demand for food orders during weekends compared to weekdays. Restaurants may consider increasing their staff and resources during weekends to handle the higher volume of orders.
# helper cell
df.columns
Index(['order_id', 'customer_id', 'restaurant_name', 'cuisine_type',
'cost_of_the_order', 'day_of_the_week', 'rating',
'food_preparation_time', 'delivery_time'],
dtype='object')
# Categorical | Ordinal
x = 'rating'
df[x].dtype
dtype('float64')
# Note: here null is equivalent to "Not given", as "Not given" was already converted to NaN
ratings_not_given = df[x].isnull().sum()
ratings_not_given
736
ratings_not_given_pct = ratings_not_given / total_orders * 100
print(f"Percentage of orders without rating: {format_pct(ratings_not_given_pct)}")
Percentage of orders without rating: 38.78%
🔍 Insights:
A significant portion of the ratings are 'Not given', indicating that many customers did not provide a rating.
NOTE: the plots below address only orders with a rating
# Plot
plot_helper.draw_pieplot(x)
📌 Points:
# Stats
tableit(df[x].value_counts(), index_label='Rating', value_label='#Orders')
+----------+-----------+
| Rating   | #Orders   |
+==========+===========+
| 5        | 588       |
+----------+-----------+
| 4        | 386       |
+----------+-----------+
| 3        | 188       |
+----------+-----------+
plot_helper.draw_countplot(x)
🔍 Observation:
This suggests that customers who do leave ratings generally tend to give high scores. The large number of 'Not given' ratings highlights the potential opportunity for the company to encourage more customers to share their feedback.
# helper cells
df.columns
Index(['order_id', 'customer_id', 'restaurant_name', 'cuisine_type',
'cost_of_the_order', 'day_of_the_week', 'rating',
'food_preparation_time', 'delivery_time'],
dtype='object')
x = 'food_preparation_time'
# Stats
df[x].describe()
count    1898.000000
mean       27.371970
std         4.632481
min        20.000000
25%        23.000000
50%        27.000000
75%        31.000000
max        35.000000
Name: food_preparation_time, dtype: float64
df[x].nunique()
16
🔍 Insights:
There are only 16 unique food preparation times (in minutes), indicating the variable is discrete in nature.
# explore | stats
food_preparation_time_counts = df[x].value_counts()
food_preparation_time_counts_pct = df[x].value_counts(normalize=True) * 100
food_preparation_time_counts
21    135
23    123
27    123
22    123
28    121
24    121
20    119
30    119
33    118
35    117
31    116
26    115
25    113
34    113
32    113
29    109
Name: food_preparation_time, dtype: int64
food_preparation_time_counts_pct
21    7.112750
23    6.480506
27    6.480506
22    6.480506
28    6.375132
24    6.375132
20    6.269758
30    6.269758
33    6.217071
35    6.164384
31    6.111697
26    6.059009
25    5.953635
34    5.953635
32    5.953635
29    5.742887
Name: food_preparation_time, dtype: float64
plot_helper.draw_histplot(x)
🔍 Insights:
The histogram of food preparation time appears roughly uniform (flat) across its range, indicating that no particular preparation time dominates.
plot_helper.draw_countplot(x, sort=False)
plot_helper.draw_pieplot(x)
🔍 Insights:
The pie chart shows that the most common food preparation time is 21 minutes, making up 7.1% of all orders.
plot_helper.draw_boxplot(x)
🔍 Observation:
The 'Food Preparation Time' column contains only 16 unique values, meaning that the time it takes for restaurants to prepare orders is fairly consistent. This could suggest that most restaurants have similar preparation times for their orders, or they may follow a standard process for preparing meals. It also indicates that there aren't many variations in how long it takes to prepare food across different orders.
The distribution of 'Food Preparation Time' shows that most orders are prepared within a time frame of 23 to 31 minutes.
df.columns
Index(['order_id', 'customer_id', 'restaurant_name', 'cuisine_type',
'cost_of_the_order', 'day_of_the_week', 'rating',
'food_preparation_time', 'delivery_time'],
dtype='object')
# numeric
x = 'delivery_time'
df[x].describe()
count    1898.000000
mean       24.161749
std         4.972637
min        15.000000
25%        20.000000
50%        25.000000
75%        28.000000
max        33.000000
Name: delivery_time, dtype: float64
plot_helper.draw_histplot(x)
🔍 Insights:
The histogram of delivery time is slightly left-skewed: most orders cluster at the longer delivery times on the right, with a thinner tail toward shorter times.
# Stats
# frequency of each delivery time
delivery_time_counts = df[x].value_counts()
delivery_time_counts
24    162
28    148
29    148
26    141
27    138
30    133
25    120
19     90
16     90
20     88
15     87
22     85
18     83
21     81
17     78
23     77
32     59
33     49
31     41
Name: delivery_time, dtype: int64
plot_helper.draw_countplot(x, sort=False)
plot_helper.draw_boxplot(x)
df[x].describe()
count    1898.000000
mean       24.161749
std         4.972637
min        15.000000
25%        20.000000
50%        25.000000
75%        28.000000
max        33.000000
Name: delivery_time, dtype: float64
📌 Points:
🔍 Observation:
On average, deliveries take about 24 minutes, with most orders falling between 20 and 28 minutes. While the fastest deliveries are completed in 15 minutes, some take up to 33 minutes, and half of all orders are delivered within 25 minutes.
The moderate variability in delivery time (around 5 minutes) suggests a consistent but slightly flexible pace across deliveries.
# Q7.
k = 5
# NOTE: The resulting object will be in descending order so that the first element is the most frequently-occurring element.
top_5_restaurant = df['restaurant_name'].value_counts().head(5)
top_5_restaurant
Shake Shack                  219
The Meatball Shop            132
Blue Ribbon Sushi            119
Blue Ribbon Fried Chicken     96
Parm                          68
Name: restaurant_name, dtype: int64
tableit(top_5_restaurant, index_label='Restaurant', value_label='Order Count')
+---------------------------+---------------+
| Restaurant                | Order Count   |
+===========================+===============+
| Shake Shack               | 219           |
+---------------------------+---------------+
| The Meatball Shop         | 132           |
+---------------------------+---------------+
| Blue Ribbon Sushi         | 119           |
+---------------------------+---------------+
| Blue Ribbon Fried Chicken | 96            |
+---------------------------+---------------+
| Parm                      | 68            |
+---------------------------+---------------+
🔍 Observation:
The top 5 restaurants in terms of the number of orders received are:
weekend_orders = df[df['day_of_the_week'] == 'Weekend']
cuisine_order_count = weekend_orders['cuisine_type'].value_counts()
cuisine_order_count
American          415
Japanese          335
Italian           207
Chinese           163
Mexican            53
Indian             49
Mediterranean      32
Middle Eastern     32
Thai               15
French             13
Korean             11
Southern           11
Spanish            11
Vietnamese          4
Name: cuisine_type, dtype: int64
# most popular cuisine and its percentage
most_popular_cuisine = cuisine_order_count.idxmax()
percentage = (cuisine_order_count.max() / cuisine_order_count.sum()) * 100
print(f"The most popular cuisine on weekends is {most_popular_cuisine} with {format_pct(percentage)} of weekend orders.")
The most popular cuisine on weekends is American with 30.72% of weekend orders.
🔍 Observation:
The most popular cuisine on weekends is American with a total of 415 orders, constituting around 31% of weekend orders. This indicates a strong preference for American cuisine among customers during weekends.
Restaurants offering American cuisine may experience higher demand on weekends and should be prepared to handle the increased volume of orders.
# mask for orders with cost greater than 20
mask_cost_above_20 = df['cost_of_the_order'] > 20
percent_of_order_above_20 = mask_cost_above_20.mean() * 100
print(f"Percentage of orders with a cost above 20 dollar is {format_pct(percent_of_order_above_20)}")
Percentage of orders with a cost above 20 dollar is 29.24%
df['cost_of_the_order'].describe()
count    1898.000000
mean       16.498851
std         7.483812
min         4.470000
25%        12.080000
50%        14.140000
75%        22.297500
max        35.410000
Name: cost_of_the_order, dtype: float64
🔍 Observation:
avg_delivery_time = df['delivery_time'].mean()
print(f"The average delivery time is {avg_delivery_time:.2f} minutes.")
The average delivery time is 24.16 minutes.
The mean order delivery time is 24.16 minutes.
# Count the number of orders for each customer
order_counts = df['customer_id'].value_counts()
# Get the top 3 customers
top_3_customers = order_counts.head(3)
top_3_customers
52832    13
47440    10
83287     9
Name: customer_id, dtype: int64
tableit(top_3_customers, index_label='Customer ID', value_label='Order Count')
+---------------+---------------+
| Customer ID   | Order Count   |
+===============+===============+
| 52832         | 13            |
+---------------+---------------+
| 47440         | 10            |
+---------------+---------------+
| 83287         | 9             |
+---------------+---------------+
🔍 Observation:
The top 3 most frequent customers by number of orders are 52832 (13 orders), 47440 (10 orders), and 83287 (9 orders).
A small group of loyal customers places multiple orders, which indicates potential for targeted loyalty programs or special offers to encourage further repeat business from these high-value customers.
🧠 Explore (New Insight):
Customer Repeat
# Explore more (Customer Repeat)
order_counts.describe()
count    1200.000000
mean        1.581667
std         1.090560
min         1.000000
25%         1.000000
50%         1.000000
75%         2.000000
max        13.000000
Name: customer_id, dtype: float64
order_counts[order_counts > 2].count()
149
149 / 1200 * 100
12.416666666666666
🔍 Insights:
There are very few repeat customers: most customers placed only one order. However, a small group of customers places multiple orders, with the highest order count reaching 13.
Since most customers place only one order, there is an opportunity to boost retention through loyalty programs or promotions that encourage repeat orders. Targeting the small group of high-frequency customers with exclusive offers could further strengthen their loyalty and increase their order frequency.
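To size such a loyalty campaign, one useful follow-up is the share of all orders that come from repeat customers (not just the share of customers who repeat). A sketch on synthetic ids (swap in `df['customer_id']` on the real data):

```python
import pandas as pd

# Synthetic customer ids standing in for df['customer_id'] (illustrative values)
ids = pd.Series([52832, 52832, 52832, 47440, 47440, 83287, 11111, 22222])

counts = ids.value_counts()
repeaters = counts[counts > 1].index                    # customers with > 1 order
repeat_order_share = ids.isin(repeaters).mean() * 100   # share of ORDERS they place
print(f"{repeat_order_share:.1f}% of orders come from repeat customers")
```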
(This can help identify popularity of cuisines among weekdays and weekends)
x = 'cuisine_type'
y = 'day_of_the_week'
df[x].nunique(), df[y].nunique()
(14, 2)
# Plot
heatmap_data = pd.crosstab(df['cuisine_type'], df['day_of_the_week'])
sns.heatmap(heatmap_data, annot=True, fmt="d", cmap="Blues", cbar=True, linewidths=0.5)
# aesthetics
plt.title("Cuisine Type vs Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Cuisine Type")
plt.xticks(rotation=45)
plt.show()
📌 Points:
The heatmap shows the relationship between cuisine type and the day of the week (Weekday vs Weekend).
Overall, the heatmap indicates that weekends generally see a higher volume of orders across most cuisine types, with American and Japanese cuisines being the most popular.
# Plot 2 (Stacked Bar Chart)
# Calculate the count of orders for each cuisine_type and day_of_the_week
order_counts = pd.crosstab(df['cuisine_type'], df['day_of_the_week'])
# Normalize to get percentages
order_percentages = order_counts.div(order_counts.sum(axis=1), axis=0) * 100
# Plot the stacked bar plot
order_percentages.plot(kind='bar', stacked=True)
# Customize the plot
plt.title('Percentage of Orders by Cuisine Type and Day of the Week')
plt.xlabel('Cuisine Type')
plt.ylabel('Percentage of Orders')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Day of the Week', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
🔍 Insights:
Vietnamese cuisine is the only one with a relatively balanced weekday/weekend split, with roughly 40% of orders on weekdays and 60% on weekends. All other cuisines are more weekend-heavy, with below 40% of their orders placed on weekdays.
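The weekday share behind this claim can be read off directly with a row-normalized crosstab. A sketch on a tiny synthetic frame (use the notebook's `df` in practice; the rows below are illustrative only):

```python
import pandas as pd

# Tiny synthetic frame standing in for the notebook's df (illustrative rows only)
toy = pd.DataFrame({
    "cuisine_type": ["Vietnamese", "Vietnamese", "American", "American", "American"],
    "day_of_the_week": ["Weekday", "Weekend", "Weekday", "Weekend", "Weekend"],
})

# Row-normalized crosstab: weekday share of each cuisine's orders
tab = pd.crosstab(toy["cuisine_type"], toy["day_of_the_week"], normalize="index") * 100
print(tab["Weekday"].sort_values(ascending=False).round(1))
```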
# Plot 3
# Calculate the count of orders for each cuisine_type and day_of_the_week
order_counts = pd.crosstab(df['cuisine_type'], df['day_of_the_week'])
# Plot the grouped bar plot
order_counts.plot(kind='bar', stacked=False)
# Customize the plot
plt.title('#Orders by Cuisine Type and Day of the Week')
plt.xlabel('Cuisine Type')
plt.ylabel('Orders Count')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Day of the Week', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
🔍 Insights:
Italian and Chinese cuisines rank just after Japanese and American cuisines in terms of orders, showing a strong customer preference for these options as well.
The business could leverage this insight by offering combo deals or promotions that highlight Italian and Chinese dishes, especially during peak ordering times, to further drive sales. This could also help in balancing the demand across different cuisines and maintaining customer interest.
🔍 Observation:
The data indicates a general trend of higher demand for various cuisines during weekends, suggesting that customers are more likely to order food during weekends.
Restaurants may consider increasing their inventory and staffing on weekends to accommodate the higher volume of orders.
(This can provide insights into whether customers tend to rate more expensive orders differently than cheaper ones)
# Plot 1
sns.violinplot(data=df, x='rating', y='cost_of_the_order')
plt.title('Distribution of Order Costs by Customer Ratings')
plt.xlabel('Rating')
plt.ylabel('Cost of the Order')
plt.show()
avg_cost_by_rating = original_df.groupby('rating')['cost_of_the_order'].mean().reset_index()
avg_cost_by_rating
|   | rating | cost_of_the_order |
|---|---|---|
| 0 | 3 | 16.223351 |
| 1 | 4 | 16.710337 |
| 2 | 5 | 16.965697 |
| 3 | Not given | 16.085340 |
# Plot 2 (Count)
# create a bar plot to show the average cost for each rating
ax = sns.barplot(data=avg_cost_by_rating, x='rating', y='cost_of_the_order')
# add count above each bar
ax.bar_label(ax.containers[0])
# Add title and labels
plt.title('Average Cost of the Order by Rating')
plt.xlabel('Rating')
plt.ylabel('Average Cost of the Order')
# Show the plot
plt.show()
🔍 Observation:
This suggests that the cost of an order does not vary significantly with the rating given by the customer: whether a customer gives a high or low rating, the average amount spent remains relatively consistent. This could imply that order pricing is independent of customer satisfaction, or that there is a broad range of prices within each rating group.
The average cost is almost identical across all ratings. Even when ratings were not provided, the average cost remains consistent, indicating that cost is independent of customer feedback.
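To back this visual impression with a test, a one-way ANOVA on cost grouped by rating would check whether the group means differ significantly. A sketch on synthetic samples, assuming scipy is available (on the real data, build the groups from `df.groupby('rating')['cost_of_the_order']`):

```python
from scipy import stats

# Synthetic cost samples per rating bucket (illustrative stand-ins for df groups)
cost_rated_3 = [10, 12, 14, 16]
cost_rated_4 = [11, 13, 15, 17]
cost_rated_5 = [10, 13, 14, 17]

# One-way ANOVA: do mean order costs differ across rating groups?
f_stat, p_value = stats.f_oneway(cost_rated_3, cost_rated_4, cost_rated_5)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

A large p-value (above 0.05) would support the observation that mean cost does not differ by rating.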
💡 Recommendations :
Since order cost does not track ratings, pricing changes alone are unlikely to move customer satisfaction; improvement efforts should focus on service factors such as food quality and delivery experience instead.
(This can help identify peak days with longer delivery times and optimize operations for faster service)
x = 'day_of_the_week'
y = 'delivery_time'
df[x].nunique(), df[y].nunique()
(2, 19)
# Plot 1
# box plot between day of the week and delivery time
sns.boxplot(data=df, x=x, y=y)
# add title and labels
plt.title('Delivery Time Comparison: Weekdays vs Weekends')
plt.xlabel('Day of the Week')
plt.ylabel('Delivery Time (mins)')
# show the plot
plt.show()
🔍 Insights:
This is quite interesting: weekends see more orders than weekdays, yet weekend orders are delivered faster. This suggests other factors affect weekday delivery times (possibly New York's weekday traffic).
# calculate median time for each day of the week
median_delivery_time = df.groupby('day_of_the_week')['delivery_time'].median().reset_index()
median_delivery_time
|   | day_of_the_week | delivery_time |
|---|---|---|
| 0 | Weekday | 28.0 |
| 1 | Weekend | 22.0 |
# Plot 2
# Strip plot for day_of_the_week vs delivery_time
sns.stripplot(x='day_of_the_week', y='delivery_time', data=df, jitter=True)
# Adding labels and title
plt.title('Delivery Time Comparison: Weekdays vs Weekends')
plt.xlabel('Day of the Week')
plt.ylabel('Delivery Time')
# Show plot
plt.show()
🔍 Observation:
Delivery times on weekdays tend to be higher and more spread out compared to weekends, where they are generally faster and more consistent. This could suggest that weekend orders may be handled more efficiently or have less traffic, while weekdays could experience more delays or variability, possibly due to increased order volume or delivery congestion.
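The weekday/weekend gap can be tested formally with a Mann-Whitney U test, which makes no normality assumption. A sketch on synthetic delivery times, assuming scipy is available (on the real data, pass the two `delivery_time` groups from `df`):

```python
from scipy import stats

# Synthetic delivery times in minutes (illustrative stand-ins for the two groups)
weekday_times = [28, 30, 27, 29, 31, 26]
weekend_times = [22, 21, 23, 20, 24, 22]

# Mann-Whitney U: nonparametric test for a weekday vs weekend difference
u_stat, p_value = stats.mannwhitneyu(weekday_times, weekend_times, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```

A small p-value would confirm that the median difference seen in the boxplot is unlikely to be noise.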
(This will help us understand if higher or lower ratings are associated with faster or slower delivery times.)
x = 'rating'
y = 'delivery_time'
df[x].nunique(), df[y].nunique()
(3, 19)
# Plot 1
# Drop rows with NaN values in the 'rating' column (for this pair analysis)
df_clean = df.dropna(subset=['rating'])
sns.swarmplot(data=df_clean, x='rating', y='delivery_time', alpha=0.7)
plt.title('Customer Rating vs Delivery Time')
plt.xlabel('Rating')
plt.ylabel('Delivery Time (minutes)')
plt.show()
🔍 Insights:
Here we get a good idea of the density of ratings around particular delivery-time values. Because we use a swarm plot, the width of each cluster shows the spread of ratings at each delivery time.
💡 Quick Observation :
Most orders are delivered in 20 to 30 minutes, regardless of rating.
# plot 2
sns.violinplot(data=df, x='rating', y='delivery_time')
plt.title('Customer Rating vs Delivery Time')
plt.xlabel('Rating')
plt.ylabel('Delivery Time (minutes)')
plt.show()
🔍 Insights:
The delivery-time distribution for rating 5 shows the most balanced, symmetric violin shape, especially compared with rating 3, indicating an even spread of values.
sns.boxplot(data=df, x='rating', y='delivery_time')
plt.title('Customer Rating vs Delivery Time')
plt.xlabel('Rating')
plt.ylabel('Delivery Time (minutes)')
plt.show()
📌 Points:
The interquartile range (IQR) of delivery time is nearly the same for all rating categories.
# stats
avg_delivery_time_by_rating = df.groupby('rating')['delivery_time'].mean().reset_index()
avg_delivery_time_by_rating
|   | rating | delivery_time |
|---|---|---|
| 0 | 3.0 | 24.558511 |
| 1 | 4.0 | 23.867876 |
| 2 | 5.0 | 24.212585 |
🔍 Observation:
This finding might indicate that delivery time is not a major driver of the variance in customer ratings, or that other factors (such as food quality, customer service, etc.) might play a more substantial role in influencing ratings & customer satisfaction.
Whether the delivery time is slightly longer or shorter, customers tend to give similar ratings.
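One quick numeric check of this claim is a Spearman correlation between rating and delivery time (rank-based, so it tolerates the discrete ratings). A sketch on synthetic pairs; on the real data the same `corr` call works directly on the columns of `df`:

```python
import pandas as pd

# Synthetic (rating, delivery_time) pairs standing in for columns of df
toy = pd.DataFrame({
    "rating":        [3, 4, 5, 3, 4, 5, 4, 5],
    "delivery_time": [25, 24, 24, 23, 26, 25, 24, 23],
})

# Rank-based correlation: a value near zero means ratings do not track delivery time
rho = toy["rating"].corr(toy["delivery_time"], method="spearman")
print(f"Spearman rho = {rho:.3f}")
```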
(This can help derive strategic insights for pricing, promotions, and even menu optimization based on customer preferences and spending behavior.)
sns.boxplot(data=df, x='cuisine_type', y='cost_of_the_order')
plt.title('Cost of the Order vs Cuisine Type')
plt.xlabel('Cuisine Type')
plt.ylabel('Cost of the Order')
plt.xticks(rotation=45)
plt.show()
📌 Points:
French, Southern, Spanish, and Thai cuisines show higher order costs, while Vietnamese and Korean are the cheapest; most other cuisines cluster in a similar mid range.
# create bar plot
sns.barplot(data=df, x='cuisine_type', y='cost_of_the_order')
# title and labels
plt.title('Average Cost of the Order by Cuisine Type')
plt.xlabel('Cuisine Type')
plt.ylabel('Average Cost')
# Rotate x-axis labels
plt.xticks(rotation=45)
# show the plot
plt.show()
# Stats
avg_cost_by_cuisine = df.groupby('cuisine_type')['cost_of_the_order'].mean().reset_index()
avg_cost_by_cuisine
|   | cuisine_type | cost_of_the_order |
|---|---|---|
| 0 | American | 16.319829 |
| 1 | Chinese | 16.305209 |
| 2 | French | 19.793889 |
| 3 | Indian | 16.919726 |
| 4 | Italian | 16.418691 |
| 5 | Japanese | 16.304532 |
| 6 | Korean | 14.001538 |
| 7 | Mediterranean | 15.474783 |
| 8 | Mexican | 16.933117 |
| 9 | Middle Eastern | 18.820612 |
| 10 | Southern | 19.300588 |
| 11 | Spanish | 18.994167 |
| 12 | Thai | 19.207895 |
| 13 | Vietnamese | 12.882857 |
🔍 Observation:
Most of the cuisines have an average cost between 14 and 19 dollars.
The variation in average costs suggests that certain cuisines may be priced higher due to factors such as ingredient costs, preparation complexity, or restaurant pricing strategies.
(This may help to discover what are specialities of certain restaurants)
x = 'restaurant_name'
y = 'cuisine_type'
df[x].nunique(), df[y].nunique()
(178, 14)
NOTE: As there are too many restaurants, we will pick the top-performing ones (i.e., those that receive comparatively more orders)
# PREP (Data Preparation)
# Filter for top 10 restaurants by order count
famous_restaurants = df['restaurant_name'].value_counts().head(10).index
filtered_df = df[df['restaurant_name'].isin(famous_restaurants)]
# hence we will deal with filtered_df
filtered_df.head()
|   | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3.0 | 25 | 15 |
| 6 | 1477894 | 157711 | The Meatball Shop | Italian | 6.07 | Weekend | NaN | 28 | 21 |
| 12 | 1476966 | 129969 | Blue Ribbon Fried Chicken | American | 24.30 | Weekend | 5.0 | 23 | 17 |
| 15 | 1477414 | 66222 | Shake Shack | American | 16.20 | Weekend | 5.0 | 33 | 25 |
| 18 | 1477138 | 47280 | RedFarm Hudson | Chinese | 14.12 | Weekend | 4.0 | 25 | 24 |
# Plot 1
sns.countplot(data=filtered_df, x='restaurant_name', hue='cuisine_type')
plt.title('Cuisine Types for Top 10 Restaurants')
plt.xlabel('Restaurant Name')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.legend(title='Cuisine Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
📌 Points:
It seems that most restaurants specialize in a specific cuisine.
Chinese cuisine also holds a good share of the market, with healthy competition among RedFarm Hudson, RedFarm Broadway, and Han Dynasty.
# Stats
# get order count for restaurant by cuisine type
restaurant_cuisine_count = filtered_df.groupby(['restaurant_name', 'cuisine_type']).size().unstack().fillna(0)
restaurant_cuisine_count
| restaurant_name | American | Chinese | Italian | Japanese |
|---|---|---|---|---|
| Blue Ribbon Fried Chicken | 96.0 | 0.0 | 0.0 | 0.0 |
| Blue Ribbon Sushi | 0.0 | 0.0 | 0.0 | 119.0 |
| Blue Ribbon Sushi Bar & Grill | 0.0 | 0.0 | 0.0 | 44.0 |
| Han Dynasty | 0.0 | 46.0 | 0.0 | 0.0 |
| Parm | 0.0 | 0.0 | 68.0 | 0.0 |
| RedFarm Broadway | 0.0 | 59.0 | 0.0 | 0.0 |
| RedFarm Hudson | 0.0 | 55.0 | 0.0 | 0.0 |
| Shake Shack | 219.0 | 0.0 | 0.0 | 0.0 |
| TAO | 0.0 | 0.0 | 0.0 | 49.0 |
| The Meatball Shop | 20.0 | 0.0 | 112.0 | 0.0 |
# Plot 2 (Pivot Table)
# Create a pivot table
pivot_table = df.pivot_table(index='restaurant_name', columns='cuisine_type', aggfunc='size', fill_value=0)
# Filter for top 10 restaurants
k = 10
top_restaurant_pivot = pivot_table.loc[pivot_table.sum(axis=1).nlargest(k).index]
sns.heatmap(top_restaurant_pivot, annot=True, fmt='d', cmap='Blues', cbar=True)
plt.title(f'Cuisine Type Distribution Across Top {k} Restaurants')
plt.xlabel('Cuisine Type')
plt.ylabel('Restaurant Name')
plt.show()
🔍 Observation:
These observations indicate that certain restaurants specialize strongly in specific cuisines and receive a high volume of orders for them, suggesting a strong customer preference for these restaurants and their specialties.
Moreover, the top restaurants don't offer much variety in cuisine type: each specializes in one or two cuisines.
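Specialization can be quantified as the share of each restaurant's orders that fall in its single biggest cuisine. A sketch on a synthetic pivot (the notebook's `restaurant_cuisine_count` or `top_restaurant_pivot` would slot in directly):

```python
import pandas as pd

# Synthetic order counts per (restaurant, cuisine), standing in for the pivot above
pivot = pd.DataFrame(
    {"American": [96, 0, 20], "Italian": [0, 68, 112]},
    index=["Blue Ribbon Fried Chicken", "Parm", "The Meatball Shop"],
)

# Specialization: share of each restaurant's orders in its single biggest cuisine
specialization = (pivot.max(axis=1) / pivot.sum(axis=1) * 100).round(1)
print(specialization.sort_values(ascending=False))
```

A value of 100 means the restaurant serves exactly one cuisine in the data.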
Investigate the distribution of food preparation times across restaurants. This can help identify restaurants with consistently higher or lower preparation times and detect outliers in preparation times for specific restaurants.
This can drive potential insights into operational and efficiency issues.
x = 'restaurant_name'
y = 'food_preparation_time'
df[x].nunique(), df[y].nunique()
(178, 16)
NOTE: As there are too many restaurants, we will pick the top-performing ones (i.e., those that receive comparatively more orders)
# PREP (Data Preparation)
# Filter for top 10 restaurants by order count
famous_restaurants = df['restaurant_name'].value_counts().head(10).index
filtered_df = df[df['restaurant_name'].isin(famous_restaurants)]
print("Top 10 Restaurants by Order Count :")
print(famous_restaurants)
Top 10 Restaurants by Order Count :
Index(['Shake Shack', 'The Meatball Shop', 'Blue Ribbon Sushi',
'Blue Ribbon Fried Chicken', 'Parm', 'RedFarm Broadway',
'RedFarm Hudson', 'TAO', 'Han Dynasty',
'Blue Ribbon Sushi Bar & Grill'],
dtype='object')
# Plot
sns.boxplot(data=filtered_df, x='restaurant_name', y='food_preparation_time')
plt.title('Food Preparation Time Distribution for Top 10 Restaurants')
plt.xlabel('Restaurant Name')
plt.ylabel('Food Preparation Time (minutes)')
plt.xticks(rotation=45)
plt.show()
🔍 Insights:
The food-preparation-time distributions are very similar across restaurants, suggesting they all follow comparable preparation workflows and therefore take roughly the same amount of time.
# Plot 2 (Horizontal Bar Plot)
restaurant_avg_prep_time = filtered_df.groupby('restaurant_name')['food_preparation_time'].mean().sort_values()
# plot (horizontal bars)
restaurant_avg_prep_time.plot(kind='barh')
# aesthetics
plt.title('Average Food Preparation Time for Top 10 Restaurants')
plt.xlabel('Average Preparation Time (minutes)')
plt.ylabel('Restaurant Name')
# note: need to provide grid property here manually
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
# Plot 3
sns.violinplot(data=filtered_df, x='restaurant_name', y='food_preparation_time', inner="box")
plt.title('Density of Food Preparation Time for Top 10 Restaurants')
plt.xlabel('Restaurant Name')
plt.ylabel('Food Preparation Time (minutes)')
plt.xticks(rotation=45)
plt.show()
# Plot 4
sns.pointplot(
    data=filtered_df,
    x="food_preparation_time",
    y="restaurant_name",
    join=False,  # show per-restaurant means as points, without connecting lines
    markers="o",
)
plt.title("Average Food Preparation Time per Restaurant")
plt.xlabel("Food Preparation Time (minutes)")
plt.ylabel("Restaurant Name")
plt.grid(axis="x", linestyle="--", alpha=0.7)
plt.show()
# get stats for food preparation time for top 10 restaurants
restaurant_food_preparation_time = filtered_df.groupby('restaurant_name')['food_preparation_time'].describe()
restaurant_food_preparation_time
| restaurant_name | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Blue Ribbon Fried Chicken | 96.0 | 27.197917 | 4.740908 | 20.0 | 23.00 | 27.0 | 31.00 | 35.0 |
| Blue Ribbon Sushi | 119.0 | 27.924370 | 4.758863 | 20.0 | 24.00 | 28.0 | 32.00 | 35.0 |
| Blue Ribbon Sushi Bar & Grill | 44.0 | 26.295455 | 4.100933 | 20.0 | 22.75 | 26.0 | 30.00 | 34.0 |
| Han Dynasty | 46.0 | 27.413043 | 4.833108 | 20.0 | 23.00 | 27.0 | 31.75 | 35.0 |
| Parm | 68.0 | 27.308824 | 4.351213 | 20.0 | 23.00 | 27.0 | 30.25 | 35.0 |
| RedFarm Broadway | 59.0 | 27.593220 | 4.464877 | 20.0 | 24.00 | 27.0 | 31.50 | 35.0 |
| RedFarm Hudson | 55.0 | 27.163636 | 4.728807 | 20.0 | 23.00 | 27.0 | 31.00 | 35.0 |
| Shake Shack | 219.0 | 27.945205 | 4.744549 | 20.0 | 24.00 | 28.0 | 32.00 | 35.0 |
| TAO | 49.0 | 26.775510 | 4.431259 | 20.0 | 23.00 | 26.0 | 31.00 | 35.0 |
| The Meatball Shop | 132.0 | 27.181818 | 4.596411 | 20.0 | 23.00 | 28.0 | 31.00 | 35.0 |
# View average prep time in sorted order for reference
restaurant_avg_prep_time
restaurant_name
Blue Ribbon Sushi Bar & Grill    26.295455
TAO                              26.775510
RedFarm Hudson                   27.163636
The Meatball Shop                27.181818
Blue Ribbon Fried Chicken        27.197917
Parm                             27.308824
Han Dynasty                      27.413043
RedFarm Broadway                 27.593220
Blue Ribbon Sushi                27.924370
Shake Shack                      27.945205
Name: food_preparation_time, dtype: float64
🔍 Observation:
Overall, the data indicates that the top 10 restaurants maintain a high level of consistency in their food preparation times, with some variability due to order complexity or volume. These observations suggest that while there is some variability in food preparation times among the top 10 restaurants, the differences are not substantial. Most restaurants have a standardized process that ensures consistent preparation times, which can lead to predictable and reliable service for customers.
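Consistency can be compared on one scale via the coefficient of variation (std divided by mean): lower values mean steadier prep times. A sketch using a few values from the describe() table above (on the real data, compute it from `restaurant_food_preparation_time`):

```python
import pandas as pd

# A few mean/std values copied from the describe() table above (top 10 restaurants)
prep_stats = pd.DataFrame(
    {"mean": [27.2, 27.9, 26.3], "std": [4.74, 4.76, 4.10]},
    index=["Blue Ribbon Fried Chicken", "Shake Shack", "Blue Ribbon Sushi Bar & Grill"],
)

# Coefficient of variation (std / mean, in %): lower means steadier prep times
cv = (prep_stats["std"] / prep_stats["mean"] * 100).round(1)
print(cv.sort_values())
```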
(Potential insights into restaurant operations based on cuisine-specific demands)
x = 'cuisine_type'
y = 'food_preparation_time'
df[x].nunique(), df[y].nunique()
(14, 16)
# Plot 1
sns.violinplot(data=df, x='cuisine_type', y='food_preparation_time', inner="box")
plt.title('Density & Distribution of Food Preparation Time for Cuisines')
plt.xlabel('Cuisines')
plt.ylabel('Food Preparation Time (minutes)')
plt.xticks(rotation=45)
plt.show()
# Plot 2
# Plot the strip plot
sns.stripplot(
data=df,
x='food_preparation_time',
y='cuisine_type',
jitter=True,
alpha=0.7,
palette='Set1',
)
plt.title("Food Preparation Time for Cuisines")
plt.xlabel("Food Preparation Time (minutes)")
plt.ylabel("Cuisines")
plt.grid(axis="x", linestyle="--", alpha=0.7)
plt.show()
# Get Descriptive Stats about food preparation time for each cuisine type
cuisine_prep_time_stats = df.groupby('cuisine_type')['food_preparation_time'].describe()
cuisine_prep_time_stats
| cuisine_type | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| American | 584.0 | 27.440068 | 4.658474 | 20.0 | 23.00 | 27.0 | 31.0 | 35.0 |
| Chinese | 215.0 | 27.511628 | 4.758525 | 20.0 | 23.50 | 27.0 | 32.0 | 35.0 |
| French | 18.0 | 26.888889 | 5.039712 | 21.0 | 22.50 | 25.5 | 31.0 | 35.0 |
| Indian | 73.0 | 27.109589 | 4.620491 | 20.0 | 23.00 | 27.0 | 31.0 | 35.0 |
| Italian | 298.0 | 27.483221 | 4.519474 | 20.0 | 23.00 | 28.0 | 31.0 | 35.0 |
| Japanese | 470.0 | 27.510638 | 4.604496 | 20.0 | 24.00 | 27.5 | 32.0 | 35.0 |
| Korean | 13.0 | 25.461538 | 3.971049 | 20.0 | 23.00 | 25.0 | 26.0 | 33.0 |
| Mediterranean | 46.0 | 27.000000 | 4.908043 | 20.0 | 23.00 | 27.0 | 31.0 | 35.0 |
| Mexican | 77.0 | 26.727273 | 4.604450 | 20.0 | 23.00 | 26.0 | 31.0 | 35.0 |
| Middle Eastern | 49.0 | 26.673469 | 4.012417 | 20.0 | 24.00 | 27.0 | 30.0 | 34.0 |
| Southern | 17.0 | 27.588235 | 5.523346 | 20.0 | 24.00 | 26.0 | 33.0 | 35.0 |
| Spanish | 12.0 | 26.916667 | 4.501683 | 20.0 | 23.75 | 27.5 | 28.5 | 35.0 |
| Thai | 19.0 | 27.315789 | 5.498006 | 21.0 | 22.00 | 28.0 | 32.5 | 35.0 |
| Vietnamese | 7.0 | 25.714286 | 5.154748 | 20.0 | 21.50 | 25.0 | 29.5 | 33.0 |
restaurant_prep_time = df.groupby('cuisine_type')['food_preparation_time'].median().sort_values()
restaurant_prep_time
cuisine_type
Korean            25.0
Vietnamese        25.0
French            25.5
Mexican           26.0
Southern          26.0
American          27.0
Chinese           27.0
Indian            27.0
Mediterranean     27.0
Middle Eastern    27.0
Japanese          27.5
Spanish           27.5
Italian           28.0
Thai              28.0
Name: food_preparation_time, dtype: float64
🔍 Observation:
These observations suggest that the preparation times for different cuisines are fairly consistent, with only slight variations. This consistency can lead to predictable and reliable service for customers, regardless of the cuisine type.
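The "only slight variations" claim can be pinned down by the range of median prep times across cuisines. A sketch on a few medians copied from the table above (on the real data, use `restaurant_prep_time` directly):

```python
import pandas as pd

# A few median prep times (minutes) copied from the table above
medians = pd.Series(
    {"Korean": 25.0, "Vietnamese": 25.0, "French": 25.5, "Italian": 28.0, "Thai": 28.0}
)

spread = medians.max() - medians.min()
print(f"Median prep time spans only {spread:.1f} minutes across these cuisines")
```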
(This can help identify if there’s a relationship between the cost of an order and the time taken to prepare the food.) For example, do higher-cost orders generally require more preparation time?
x = 'cost_of_the_order'
y = 'food_preparation_time'
# Get correlation between 2 variables
df[x].corr(df[y])
0.04152747282774475
df[[x, y]].corr()
| cost_of_the_order | food_preparation_time | |
|---|---|---|
| cost_of_the_order | 1.000000 | 0.041527 |
| food_preparation_time | 0.041527 | 1.000000 |
sns.lmplot(data=df,
x='food_preparation_time',
y='cost_of_the_order',
height=6, aspect=2.5,
scatter_kws={'alpha': 0.6}, # Transparency for scatter points
line_kws={'linewidth': 1.5, 'color': 'red'}) # Thicker regression lines
plt.title('Cost of Order vs Food Preparation Time', y=1.03)
plt.xlabel('Food Preparation Time (minutes)')
plt.ylabel('Cost of Order')
plt.grid(alpha=0.3)
plt.show()
sns.kdeplot(
data=df,
x="food_preparation_time",
y="cost_of_the_order",
)
plt.title("Cost of Order vs Food Preparation Time")
plt.xlabel("Food Preparation Time (minutes)")
plt.ylabel("Cost of Order")
plt.grid(alpha=0.3)
plt.show()
🔍 Insights:
At a high level, food preparation time does not strongly influence order cost (the correlation is only about 0.04): high-cost items do not necessarily take longer to prepare, nor are low-cost items quicker.
Hence cuisine type and restaurant pricing policies likely play a more significant role in determining cost.
(This can reveal demand patterns for specific cuisines on different days.)
x = 'cuisine_type'
y = 'cost_of_the_order'
z = 'day_of_the_week'
cost_of_cuisine_by_day = df.groupby([x, z])[y].mean()
cost_of_cuisine_by_day
cuisine_type day_of_the_week
American Weekday 15.308343
Weekend 16.731735
Chinese Weekday 17.147692
Weekend 16.036442
French Weekday 20.866000
Weekend 19.381538
Indian Weekday 17.162500
Weekend 16.800816
Italian Weekday 16.890110
Weekend 16.211449
Japanese Weekday 16.368074
Weekend 16.278925
Korean Weekday 12.180000
Weekend 14.332727
Mediterranean Weekday 14.040000
Weekend 16.102500
Mexican Weekday 16.555833
Weekend 17.103962
Middle Eastern Weekday 19.832941
Weekend 18.282812
Southern Weekday 19.226667
Weekend 19.340909
Spanish Weekday 12.130000
Weekend 19.618182
Thai Weekday 15.462500
Weekend 20.206667
Vietnamese Weekday 11.916667
Weekend 13.607500
Name: cost_of_the_order, dtype: float64
# convert to table form by unstack
# NOTE: second level (ie day_of_the_week) will become column
cost_of_cuisine_by_day_tbl = cost_of_cuisine_by_day.unstack()
cost_of_cuisine_by_day_tbl
| cuisine_type | Weekday | Weekend |
|---|---|---|
| American | 15.308343 | 16.731735 |
| Chinese | 17.147692 | 16.036442 |
| French | 20.866000 | 19.381538 |
| Indian | 17.162500 | 16.800816 |
| Italian | 16.890110 | 16.211449 |
| Japanese | 16.368074 | 16.278925 |
| Korean | 12.180000 | 14.332727 |
| Mediterranean | 14.040000 | 16.102500 |
| Mexican | 16.555833 | 17.103962 |
| Middle Eastern | 19.832941 | 18.282812 |
| Southern | 19.226667 | 19.340909 |
| Spanish | 12.130000 | 19.618182 |
| Thai | 15.462500 | 20.206667 |
| Vietnamese | 11.916667 | 13.607500 |
# Plot 1
sns.barplot(data=df, x=x, y=y, hue=z)
plt.title('Average Cost of Order by Cuisine Type and Day of the Week')
plt.xlabel('Cuisine Type')
plt.ylabel('Average Cost of Order')
plt.xticks(rotation=45)
plt.legend(title='Day of the Week', loc='upper left')
plt.show()
# Plot 2
sns.boxplot(data=df, x=x, y=y, hue=z)
plt.title('Distribution Summary of Order Cost by Cuisine Type and Day of the Week')
plt.xlabel('Cuisine Type')
plt.ylabel('Average Cost of Order')
plt.xticks(rotation=45)
plt.legend(title='Day of the Week', loc='upper left')
plt.show()
spanish_weekday_orders = df[(df['cuisine_type'] == 'Spanish') & (df['day_of_the_week'] == 'Weekday')]
spanish_weekday_orders
|   | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 1138 | 1477023 | 99288 | Boqueria | Spanish | 12.13 | Weekday | NaN | 33 | 28 |
korean_weekday_orders = df[(df['cuisine_type'] == 'Korean') & (df['day_of_the_week'] == 'Weekday')]
korean_weekday_orders
|   | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 1449 | 1477461 | 241719 | Kori Restaurant and Bar | Korean | 12.18 | Weekday | 4.0 | 26 | 26 |
| 1859 | 1476790 | 141732 | Cho Dang Gol | Korean | 12.18 | Weekday | 5.0 | 20 | 26 |
🔍 Insights:
Spanish and Korean cuisines see almost no weekday orders in this sample (one and two orders, respectively), so their weekday averages rest on tiny counts and should be interpreted cautiously.
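Before reading much into these weekday averages, it helps to check the order counts behind each cuisine/day cell, since means over one or two orders are unstable. A sketch on a tiny synthetic frame (run the same crosstab on the notebook's `df` for the real counts):

```python
import pandas as pd

# Tiny synthetic frame standing in for the notebook's df (illustrative rows only)
toy = pd.DataFrame({
    "cuisine_type": ["Spanish", "Spanish", "Korean", "Korean", "Korean", "American"],
    "day_of_the_week": ["Weekday", "Weekend", "Weekday", "Weekday", "Weekend", "Weekday"],
})

# Order counts behind each (cuisine, day) mean: tiny cells make averages unreliable
counts = pd.crosstab(toy["cuisine_type"], toy["day_of_the_week"])
print(counts)
```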
(Determine how preparation time and cost impact customer ratings. The goal is to explore correlations and trends, which can highlight key drivers of customer satisfaction.)
x = 'cost_of_the_order'
y = 'food_preparation_time'
z = 'rating'
# Plot
g = sns.jointplot(
data=df,
x="cost_of_the_order",
y="food_preparation_time",
kind="reg", # Regression plot
height=8, # Figure size
space=0.5, # Space between main plot and marginal plots
marginal_kws={"bins": 20, "fill": True}, # Styling marginal histograms
color="teal",
)
plt.suptitle(
"Relationship Between Cost of Order and Food Preparation Time", y=1.02, fontsize=14
)
g.figure.set_figwidth(15)
g.figure.set_figheight(6)
plt.show()
🔍 Insights:
The distribution of food preparation times is fairly uniform, with no significant peaks or concentrations at any particular time range. This indicates that preparation times are consistent across all orders, with no noticeable trend or bias toward faster or slower preparations.
The distribution of order costs is right-skewed, meaning that most orders are relatively inexpensive, with fewer orders having significantly higher costs. This suggests that while the majority of customers make lower-cost orders, a few large or multi-item orders drive the higher cost values.
g = sns.jointplot(
data=df,
x="cost_of_the_order",
y="food_preparation_time",
    kind="hex",  # Hexagonal bin plot
height=7, # Figure size
space=0.5, # Space between main plot and marginal plots
)
plt.suptitle(
"Relationship Between Cost of Order and Food Preparation Time", y=1.02, fontsize=14
)
g.figure.set_figwidth(15)
g.figure.set_figheight(7)
plt.show()
🔍 Insights:
The hex plot reveals that the majority of orders, especially the higher frequency ones, tend to have lower costs and relatively shorter food preparation times. However, there are some instances of low-cost orders that took significantly longer to prepare, suggesting that while most orders follow this trend, there are exceptions with longer preparation times for certain cost-effective items.
# Plot
g = sns.jointplot(
data=df,
x="cost_of_the_order",
y="food_preparation_time",
kind="scatter", # Scatter plot
height=6, # Figure size
hue="rating",
space=0.5, # Space between main plot and marginal plots
color="teal",
)
plt.suptitle(
"Relationship Between Cost of Order and Food Preparation Time", y=1.02, fontsize=14
)
g.figure.set_figwidth(15)
g.figure.set_figheight(6)
plt.show()
🔍 Insights:
The scatter plot reveals that orders across all ratings are spread out across both preparation time and order cost. This indicates that there's no clear pattern linking the rating to either the time taken for preparation or the cost of the order. In other words, the ratings don't seem to depend on how long the food takes to prepare or how much it costs. Note: here we are ignoring the orders where ratings are not provided
Orders with a rating of 5 show some unusual patterns compared to those rated 3 and 4. Specifically, these orders tend to have a higher or more varied cost and food preparation time, which isn’t as consistent in the lower-rated orders. This suggests that while customers may give a perfect rating, these orders might involve more complex or expensive meals, leading to less predictability in how much they cost or how long they take to prepare
# Plot
sns.swarmplot(
data=df,
x='rating',
y='food_preparation_time',
hue='rating',
dodge=True,
)
plt.title('Food Preparation Time vs Rating')
plt.xlabel('Rating')
plt.ylabel('Food Preparation Time')
plt.legend(title='Rating')
plt.show()
🔍 Insights:
Many customers rated 5 regardless of food preparation time; similar behavior is seen for rating 4.
# Plot
g = sns.FacetGrid(df, col='rating', col_wrap=3, height=4, aspect=1.2, sharex=False, sharey=False)
g.map(sns.scatterplot, 'cost_of_the_order', 'food_preparation_time', alpha=0.7, color='teal')
g.set_titles("Rating: {col_name}")
g.set_axis_labels("Cost of the Order", "Food Preparation Time")
g.figure.suptitle('Cost vs Food Preparation Time by Rating', y=1.02, fontsize=14)
plt.show()
🔍 Observation:
From the scatter plot, it appears that customers who rated their orders 4 or 5 tend to order lower-cost items, regardless of preparation time. Customers who rated their orders 3, on the other hand, seem to prioritize factors other than cost, such as quality or service.
# Plot
fig = plt.figure(figsize=(15, 6))
ax = fig.add_subplot(111, projection='3d')
# Scatter plot
sc = ax.scatter(
df['cost_of_the_order'],
df['food_preparation_time'],
df['rating'],
c=df['rating'], cmap='viridis', alpha=0.8
)
# Axis labels
ax.set_xlabel('Cost of the Order')
ax.set_ylabel('Food Preparation Time')
ax.set_zlabel('Rating')
# Colorbar
cbar = plt.colorbar(sc, ax=ax, pad=0.1)
cbar.set_label('Rating')
plt.title('3D Scatter: Cost, Preparation Time, and Rating')
plt.show()
🔍 Insights:
In the 3D plot, the points representing orders with a 5-star rating are more evenly distributed across both food preparation time and cost, indicating a uniform pattern in customer choices. In contrast, the 3-star rating plane shows a noticeable gap between the clustered points, suggesting that these customers might have more varied preferences or that there is a specific factor influencing their decisions differently than those who gave higher ratings.
# Plot
# Calculate correlation matrix
corr_matrix = df[['cost_of_the_order', 'food_preparation_time', 'rating']].corr()
# Plot heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='Blues', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap of Cost, Food Preparation Time, and Rating')
plt.show()
🔍 Observation:
The correlations between cost, preparation time, and rating are all very weak, indicating that these factors are largely independent of each other. This suggests that other factors, such as food quality, service, and customer expectations, may play a more significant role in determining customer satisfaction and ratings.
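Since rating is an ordinal variable, Spearman's rank correlation may be a better fit than the default Pearson correlation used above. A minimal sketch on a toy stand-in for the order data (the values below are illustrative, not from the dataset):

```python
import pandas as pd

# Synthetic stand-in for a few orders (illustrative values only)
df = pd.DataFrame({
    'cost_of_the_order': [30.75, 12.08, 12.23, 29.20, 11.59],
    'food_preparation_time': [25, 25, 23, 25, 25],
    'rating': [5.0, 4.0, 5.0, 3.0, 4.0],
})

# Spearman operates on ranks, so it is robust for ordinal ratings
# and for monotonic-but-nonlinear relationships
spearman_corr = df[['cost_of_the_order', 'food_preparation_time', 'rating']].corr(method='spearman')
print(spearman_corr.round(2))
```

If the Spearman coefficients are also near zero on the real data, that strengthens the conclusion that cost and preparation time are largely unrelated to rating.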
# Sanity check
# Looks good: the rating column is now numeric (converted earlier)
df.head()
|   | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | NaN | 25 | 20 |
| 1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | NaN | 25 | 23 |
| 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5.0 | 23 | 28 |
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3.0 | 25 | 15 |
| 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4.0 | 25 | 24 |
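The NaN ratings above are missing ratings that were converted when the column was made numeric. A hedged sketch of that kind of conversion on toy data (the `'Not given'` placeholder string is an assumption about how missing ratings appear in the raw file):

```python
import numpy as np
import pandas as pd

# Toy rating column mixing numeric strings and a missing-value placeholder
df = pd.DataFrame({'rating': ['5', 'Not given', '3', '4']})

# Replace the placeholder with NaN, then coerce to numeric;
# aggregations like count() and mean() silently skip NaN afterwards
df['rating'] = pd.to_numeric(df['rating'].replace('Not given', np.nan))

print(df['rating'].mean())  # (5 + 3 + 4) / 3 = 4.0
```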
# ? NOTE - at this point 'rating' is numeric; missing ratings are NaN, which count/mean ignore
# 2. Group by restaurant name and calculate the count of ratings and the average rating
restaurant_stats = df.groupby('restaurant_name')['rating'].agg(['count', 'mean'])
# 3. Filter restaurants based on the criteria: count > 50 and mean > 4
eligible_restaurants = restaurant_stats[(restaurant_stats['count'] > 50) & (restaurant_stats['mean'] > 4)]
# 4. Display the eligible restaurants
tableit_df(eligible_restaurants, show_index=True)
+---------------------------+-------+--------------------+
| restaurant_name           | count | mean               |
+---------------------------+-------+--------------------+
| Blue Ribbon Fried Chicken | 64.0  | 4.328125           |
| Blue Ribbon Sushi         | 73.0  | 4.219178082191781  |
| Shake Shack               | 133.0 | 4.2781954887218046 |
| The Meatball Shop         | 84.0  | 4.511904761904762  |
+---------------------------+-------+--------------------+
# ! Printing raw as well (in case the tabular display above doesn't work, e.g. a missing module or a display inconsistency)
eligible_restaurants
| restaurant_name | count | mean |
|---|---|---|
| Blue Ribbon Fried Chicken | 64 | 4.328125 |
| Blue Ribbon Sushi | 73 | 4.219178 |
| Shake Shack | 133 | 4.278195 |
| The Meatball Shop | 84 | 4.511905 |
print('Eligible Restaurants for Promotional Offer:')
eligible_restaurants.index.to_list()
Eligible Restaurants for Promotional Offer:
['Blue Ribbon Fried Chicken', 'Blue Ribbon Sushi', 'Shake Shack', 'The Meatball Shop']
🔍 Observation:
The following restaurants meet the criteria of having more than 50 ratings and an average rating greater than 4:

- Blue Ribbon Fried Chicken: 4.33 (64 ratings)
- Blue Ribbon Sushi: 4.22 (73 ratings)
- Shake Shack: 4.28 (133 ratings)
- The Meatball Shop: 4.51 (84 ratings)

These restaurants are eligible for the promotional offer as they consistently receive high ratings from a significant number of customers.
def calculate_revenue(row):
    '''Calculate company revenue (commission) for a single order.'''
    # ? NOTE : the unit of currency is dollars; match/case requires Python 3.10+
    match row['cost_of_the_order']:
        case cost if cost > 20:  # more than 20 dollars: 25% commission
            return cost * 0.25
        case cost if cost > 5:   # more than 5 dollars: 15% commission
            return cost * 0.15
        case _:                  # 5 dollars or less: no commission
            return 0
# Apply the function to each row and calculate the total revenue
orders_revenue = df.apply(calculate_revenue, axis=1)
# Calculate total net revenue generated
total_revenue = orders_revenue.sum()
# Display the total revenue
print(f"Total Net Revenue Generated by the Company across all orders is: ${total_revenue:.2f}")
Total Net Revenue Generated by the Company across all orders is: $6166.30
🔍 Observation:
The company's commission model (25% on orders above $20, 15% on orders above $5, nothing otherwise) generated a total net revenue of $6,166.30 across all orders.
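The row-wise `apply` above can also be expressed as a vectorized computation, which is typically faster on large frames. A sketch of the same tiered margin rule using `numpy.select`, on toy costs covering each tier:

```python
import numpy as np
import pandas as pd

# Toy costs hitting each commission tier (illustrative only)
df = pd.DataFrame({'cost_of_the_order': [30.0, 12.0, 4.0]})

cost = df['cost_of_the_order']
conditions = [cost > 20, cost > 5]    # evaluated in order, like if/elif
margins = [cost * 0.25, cost * 0.15]  # 25% and 15% commission tiers
revenue = np.select(conditions, margins, default=0.0)

print(revenue.sum())  # 30*0.25 + 12*0.15 + 0 = 9.30
```

Both approaches give identical totals; the vectorized form just avoids a Python-level loop over rows.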
# Calculate total time = (food preparation time + delivery time)
order_total_delivery_time = df['food_preparation_time'] + df['delivery_time']
# Find percentage of orders taking more than 60 minutes
percentage_orders_above_60min = (order_total_delivery_time > 60).mean() * 100
# Result
print(f"Percentage of orders taking more than 60 minutes from the time the order is placed is: {format_pct(percentage_orders_above_60min)}")
Percentage of orders taking more than 60 minutes from the time the order is placed is: 10.54%
🔍 Observation:
About 10.54% of orders, roughly 1 in 10, take more than 60 minutes from the time the order is placed until delivery (food preparation time plus delivery time).
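The `.mean()` trick used above works because a boolean Series averages to the fraction of `True` values. A tiny illustration on toy total times:

```python
import pandas as pd

# Toy total times in minutes: two of the five orders exceed 60
total_time = pd.Series([45, 65, 50, 70, 30])

# (total_time > 60) is a boolean Series; its mean is the share of True values
pct_over_60 = (total_time > 60).mean() * 100
print(pct_over_60)  # 2 out of 5 -> 40.0
```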
mean_delivery_time_by_day = df.groupby('day_of_the_week')['delivery_time'].mean()
tableit(mean_delivery_time_by_day)
+-----------------+---------------------------+
| day_of_the_week | mean delivery_time (min)  |
+=================+===========================+
| Weekday         | 28.34                     |
+-----------------+---------------------------+
| Weekend         | 22.47                     |
+-----------------+---------------------------+
# ! Printing raw as well (in case the tabular display above doesn't work, e.g. a missing module or a display inconsistency)
mean_delivery_time_by_day
day_of_the_week
Weekday    28.340037
Weekend    22.470022
Name: delivery_time, dtype: float64
🔍 Observation:
The mean delivery time on weekdays is 28.34 minutes, whereas on weekends it is 22.47 minutes. This shows that delivery times are generally longer on weekdays compared to weekends, with a difference of approximately 5.87 minutes. This could indicate that there are more delays or possibly higher demand for deliveries during weekdays.
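To check whether the weekday/weekend gap is statistically meaningful rather than noise, one could run Welch's t-test on the two groups. A sketch on synthetic delivery times roughly matching the observed means (this assumes `scipy` is available; the sample sizes and spread below are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic delivery times (minutes) centred near the observed group means
weekday = rng.normal(28.3, 5, 200)
weekend = rng.normal(22.5, 5, 500)

# Welch's t-test: does not assume equal variances or equal sample sizes
t_stat, p_value = stats.ttest_ind(weekday, weekend, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

On the real data, a small p-value would support the claim that weekday deliveries are genuinely slower, not just slower in this sample.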
# Just a curiosity-driven look at Indian cuisine orders
indian_orders = (df['cuisine_type'] == 'Indian').sum()
indian_orders_pct = indian_orders / total_orders * 100
format_pct(indian_orders_pct)
'3.85%'
Key insight areas covered in this analysis:
- Customer Preferences
- Order Ratings
- Order and Delivery Patterns
- Customer Behavior
- Cuisines and Spending
- Geographical Insights

Business Implications:
- Cuisine Focus: Target Popular Cuisines
- Delivery Time & Efficiency: Enhance Delivery Speed, especially during weekdays
- Operational Adjustments: Weekend Order Management
- Customer Retention & Feedback: Boost Repeat Customers; Encourage Ratings and Feedback
- Promotional Campaigns: Cuisine-Specific Campaigns
- Enhancing Customer Experience: Small Gestures to Improve Ratings

⚡ By focusing on these strategic areas, businesses can not only attract more customers but also improve customer satisfaction, retention, and operational efficiency, leading to sustained growth.